Prediction of maladaptation from ecological and genomic data using genomic offsets

Master’s Thesis in Bioinformatics

Curro Campuzano Jiménez

Bioinformatics Research Center at Aarhus University

June 24, 2024

Genomic offsets

A set of statistical tools that predict the maladaptation of populations to rapid environmental change based on genotype \(\times\) environment association models

Figure 1

Outline

  1. Overview of the Gain and collaborators (2023) model
  2. Methodological analysis with general simulations
    • Comparison of different methods under different scenarios

    • Identifying putatively adaptive loci

    • Measuring uncertainty

  3. Case study: Mediterranean thyme

An explicit model of genomic offsets by Gain and collaborators

Figure 2

Gaussian stabilizing selection

\[ w(z| \mathbf{x}^*) = \exp\left(\frac{-\left(z - z_{\text{opt}}(\mathbf x ^*) \right)^2}{2V_S}\right) \]
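As a minimal sketch of the fitness function above, the snippet below evaluates Gaussian stabilizing selection for a single phenotype value. The function name and the example numbers are illustrative assumptions, not taken from the thesis code.

```python
import math


def gaussian_fitness(z: float, z_opt: float, v_s: float) -> float:
    """Fitness w(z | x*) of phenotype z when the optimum under x* is z_opt.

    v_s is the width of the fitness function (V_S in the formula above).
    """
    if v_s <= 0:
        raise ValueError("v_s must be positive")
    return math.exp(-((z - z_opt) ** 2) / (2.0 * v_s))


# Illustrative values (assumptions): an individual sitting at the optimum,
# and the same individual after the optimum has shifted by one unit.
w_at_optimum = gaussian_fitness(z=0.5, z_opt=0.5, v_s=1.0)
w_shifted = gaussian_fitness(z=0.5, z_opt=1.5, v_s=1.0)

# The negative log fitness is the squared distance to the optimum
# divided by 2 * V_S.
neg_log_w = -math.log(w_shifted)
```

Fitness equals 1 at the optimum, and the negative log fitness grows quadratically with the distance between the phenotype and the optimum.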

Fit a genotype \(\times\) environment association model

Figure 3

We have to assume that all individuals are at their adaptive optimum and that we can measure the QTLs!

With a bit of math rearranging …

\[ \mathcal G^2(\mathbf{x}, \mathbf{x}^*) = \frac{\sum_{l=1}^{L} \left(\hat y_l(\mathbf x) - \hat y_l(\mathbf x^*)\right)^2}{L} \]
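A minimal sketch of the geometric genomic offset, assuming it is the mean over the L loci of the squared difference between the values predicted in the current and the shifted environment. The function name and the example predictions are hypothetical.

```python
def geometric_genomic_offset(y_current, y_shifted):
    """Mean squared difference between per-locus predictions.

    y_current[l] and y_shifted[l] are the predicted values at locus l
    under the current environment x and the shifted environment x*.
    """
    if len(y_current) != len(y_shifted) or not y_current:
        raise ValueError("need two non-empty vectors of equal length")
    n_loci = len(y_current)
    squared = sum((a - b) ** 2 for a, b in zip(y_current, y_shifted))
    return squared / n_loci


# Hypothetical predictions at three loci (not data from the thesis).
y_now = [0.1, 0.5, 0.9]
y_future = [0.2, 0.5, 0.7]
offset = geometric_genomic_offset(y_now, y_future)
```

With these numbers the squared differences are 0.01, 0 and 0.04, so the offset is 0.05 / 3.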

Under Gaussian stabilizing selection, we would find a relationship between the genomic offset and shifted fitness:

\[ \mathbb E\left[-\log w(\mathbf{x}, \mathbf{x}^*)\right] \approx \frac{a^2\mathcal G^2(\mathbf{x}, \mathbf{x}^*)}{2V_S} \]

Genomic offsets in a nutshell

  1. Sample locally optimal individuals and measure their genotypes and current environment
  2. Identify a set of putatively adaptive loci using hypothesis testing
  3. Fit a genotype \(\times\) environment association statistical model
  4. Calculate genomic offset between current and shifted environment
  5. Rank individuals / populations
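The steps above can be sketched end to end under strong simplifications: a single environmental variable, ordinary least squares fitted independently at each locus, and every locus treated as a candidate (step 2 is skipped). All names and data are illustrative assumptions and do not reproduce the methods compared in the thesis.

```python
def fit_locus(env, genotypes):
    """Least-squares fit of one locus on one environmental variable.

    Returns (intercept, slope).
    """
    n = len(env)
    mean_x = sum(env) / n
    mean_y = sum(genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in env)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(env, genotypes))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def genomic_offsets(env_current, env_shifted, genotype_matrix):
    """genotype_matrix[i][l] is the genotype of individual i at locus l."""
    n_loci = len(genotype_matrix[0])
    # Step 3: one association model per locus.
    models = [
        fit_locus(env_current, [row[l] for row in genotype_matrix])
        for l in range(n_loci)
    ]
    # Step 4: mean squared difference between predictions under x and x*.
    offsets = []
    for x, x_star in zip(env_current, env_shifted):
        squared = sum(((a + b * x) - (a + b * x_star)) ** 2 for a, b in models)
        offsets.append(squared / n_loci)
    return offsets


def rank_by_offset(offsets):
    """Step 5: indices of individuals from largest to smallest offset."""
    return sorted(range(len(offsets)), key=lambda i: offsets[i], reverse=True)


# Step 1 (toy data, assumed): four individuals, two loci, one variable.
env_current = [0.0, 1.0, 2.0, 3.0]
env_shifted = [0.5, 1.0, 3.0, 3.2]
genotype_matrix = [[0, 2], [1, 1], [1, 1], [2, 0]]

offsets = genomic_offsets(env_current, env_shifted, genotype_matrix)
ranking = rank_by_offset(offsets)
```

In this toy example the third individual experiences the largest environmental shift and is ranked first, while the second individual experiences no shift and has an offset of zero.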

Results

Simulated data using SLiM

Figure 4

Data

  • Genotype matrix
  • Current environmental matrix
  • Shifted environmental matrix

Figure 5

Different local and non-local adaptation scenarios

[Plot: negative log shifted fitness (y-axis) against empirical geometric genomic offset (x-axis); panels: Non-local adaptation, Locally adapted, Locally optimal]

No differences between methods if using the same set of candidates

[Plot: adjusted R squared by genomic offset method (Geometric, Gradient forest, RDA, RONA) and scenario (Locally optimal, Locally adapted, Non-local adaptation)]

Figure 6
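The plots report adjusted R squared. A minimal sketch of that metric follows, assuming it comes from a simple linear regression of negative log shifted fitness on the genomic offset; the exact regression used in the thesis is not stated here, and the function name and data are hypothetical.

```python
def adjusted_r_squared(x, y, n_predictors=1):
    """Adjusted R squared of the least-squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot
    # Penalize for the number of predictors relative to the sample size.
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Hypothetical offsets and negative log fitness values.
perfect = adjusted_r_squared([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
weak = adjusted_r_squared([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
```

Unlike the plain R squared, the adjusted version can fall below zero when the predictor explains little variance, as in the second example.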

How to identify the putatively adaptive loci?

  • Hypothesis testing approach based on a genotype \(\times\) environment association model

  • Other options?
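One simple way to sketch the hypothesis testing approach is a per-locus correlation test between genotype and a single environmental variable. The version below uses the Fisher z approximation for the p-value and, as a loud caveat, does not correct for population structure. The function names, the threshold, and the data are assumptions, not the test used in the thesis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def association_p_value(env, genotypes):
    """Two-sided p-value from the Fisher z approximation (needs n > 3)."""
    n = len(env)
    r = max(min(pearson_r(env, genotypes), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def candidate_loci(env, genotype_matrix, alpha=0.05):
    """Indices of loci whose association p-value is below alpha."""
    n_loci = len(genotype_matrix[0])
    return [
        l for l in range(n_loci)
        if association_p_value(env, [row[l] for row in genotype_matrix]) < alpha
    ]


# Toy data (assumed): locus 0 tracks the environment, locus 1 does not.
env = [float(i) for i in range(10)]
locus_0 = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
locus_1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
genotype_matrix = [[a, b] for a, b in zip(locus_0, locus_1)]

candidates = candidate_loci(env, genotype_matrix)
```

Raising alpha keeps more loci, trading extra false positives for fewer false negatives.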

Minimize false negatives!

[Plot: adjusted R squared for different sets of loci (Causal, Empirical, All SNPs, Incomplete (5+5 QTLs), Missing secondary trait (5+0 QTLs), Missing primary trait (0+5 QTLs)) under weak and strong asymmetry; annotations: increase false positives, increase false negatives]

Figure 7: Weak and strong asymmetry refer to the difference in relative importance of the two adaptive phenotypes.

What about bootstrapping to measure uncertainty?

Figure 8

Figure 9
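A minimal sketch of how bootstrapping could quantify uncertainty in the ranking, assuming loci are resampled with replacement and the populations are re-ranked in each replicate. The resampling unit, function names, and data are assumptions; the procedure in the thesis may differ.

```python
import random


def bootstrap_offset_ranks(pred_current, pred_shifted, n_boot=200, seed=0):
    """Ranks of each population across bootstrap replicates.

    pred_current[i][l] and pred_shifted[i][l] are predicted values for
    population i at locus l. Rank 0 means the largest offset.
    """
    rng = random.Random(seed)
    n_pop = len(pred_current)
    n_loci = len(pred_current[0])
    sq_diff = [
        [(a - b) ** 2 for a, b in zip(row_c, row_s)]
        for row_c, row_s in zip(pred_current, pred_shifted)
    ]
    ranks = [[] for _ in range(n_pop)]
    for _ in range(n_boot):
        # Resample loci with replacement and recompute every offset.
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        offsets = [sum(row[l] for l in loci) / n_loci for row in sq_diff]
        order = sorted(range(n_pop), key=lambda i: offsets[i], reverse=True)
        for rank, i in enumerate(order):
            ranks[i].append(rank)
    return ranks


def rank_interval(rank_samples, lower=0.025, upper=0.975):
    """Approximate percentile interval of the bootstrapped ranks."""
    ordered = sorted(rank_samples)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Toy predictions (assumed): three populations with well-separated offsets.
pred_current = [[0.1] * 5, [0.1] * 5, [0.1] * 5]
pred_shifted = [[0.9] * 5, [0.5] * 5, [0.1] * 5]
ranks = bootstrap_offset_ranks(pred_current, pred_shifted)
intervals = [rank_interval(r) for r in ranks]
```

Narrow rank intervals indicate a ranking that is stable to the choice of loci, while wide intervals flag populations whose relative position is uncertain.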

Representative case: Mediterranean thyme

  • Freezing-tolerant ecotype (non-phenolic monoterpenes)
  • Drought-tolerant ecotype (phenolic monoterpenes)
  • Decrease in frequency of freezing events

Modest predictive power of shifted fitness

[Plot: adjusted R squared (y-axis) against sampling timepoint in pseudo-generations since climate change started (x-axis); panels: fast (2.5 pseudo-generations), intermediate (10 pseudo-generations), and slow (50 pseudo-generations) environmental change; series: causal and empirical geometric genomic offset]

Figure 10

Future work

  1. Explicitly model two or more latent phenotypic traits
  2. Explore a larger space of hyperparameters in the simulations
  3. Consider different approaches for identifying putatively adaptive loci
  4. Further study the bootstrapping approach
  5. Extend thyme simulations
  6. Analyze real data!

Take home message

  • Genomic offsets need to be supported by external evidence of populations being locally optimal
  • Measuring the uncertainty of your estimates using bootstrapped ranked genomic offsets is a promising strategy
  • Simulate case-specific data to show whether the method could work in theory or whether it will fail (non-continuous phenotypes, migration …)

Thanks!

Extra slides

Alternative genomic offsets

Figure 11

Conceptual issues with genomic offsets

Locally optimal versus locally adapted

  • Assumptions have been stated vaguely before
  • I argue that the methods assume sampled individuals are at their adaptive optimum
  • The results of Gain and collaborators hold if the latent phenotype has constant variance

Conceptual issues in the genomic offset framework

Figure 12

Distribution of genomic offsets

[Plot: distribution of the geometric genomic offset by scenario (Non-local adaptation, Local-foreign, Home-away & local-foreign); panels: (A) Causal, (B) Empirical]

Figure 13

Different local and non-local adaptation scenarios

  • Locally adapted (local-foreign criterion)
  • Locally optimal (home-away and local-foreign criteria)
  • Non-local adaptation

Figure 14

Hypothesis testing prevents spurious inference when the phenotype has no genetic basis

Figure 15

Lind and collaborators found that randomly selected loci performed as well as putatively adaptive loci.

Selecting environmental variables

  • Model building and variable selection?
  • Prior knowledge?

Genomic offsets are robust to uncorrelated environmental variables

[Plot: adjusted R squared (y-axis) against number of added uncorrelated environmental factors (x-axis); series: number of QTLs (20, 50, 100)]

Figure 16

Be cautious and look for potential confounding variables!

Figure 17

Spearman’s correlation of bootstrapped genomic offsets

[Plot: Spearman correlation with 95% confidence interval across random seeds; panels: RDA, RONA, Geometric, Gradient Forest; series: causal and empirical genomic offset]

Figure 18

Figure 19
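Spearman's correlation compares the rank order of two sets of offsets, which matters because genomic offsets are used to rank individuals or populations. The sketch below computes it with average ranks for ties; the function names and the two example vectors are illustrative assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical offsets for four populations from two bootstrap replicates.
replicate_a = [0.30, 0.10, 0.20, 0.05]
replicate_b = [0.90, 0.20, 0.50, 0.10]
rho_same_order = spearman(replicate_a, replicate_b)
rho_reversed = spearman(replicate_a, list(reversed(sorted(replicate_a))))
```

Two replicates that order the populations identically give a correlation of 1 even when the offset values themselves differ in scale.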

A reasonably comprehensive Julia package

Figure 20

Runtimes

[Plot: runtime in seconds of 100 bootstrap iterations for Geometric, Gradient Forest, RDA, and RONA]

Figure 21

Numerical issues

[Plot: adjusted R squared (y-axis) against number of random alleles added to the genotype matrix (x-axis); panels: λ=1e-5, λ=1e2; series: latent factors (K) = 1, 2, 3]

Figure 22

Near-zero predictive power of average future fitness

[Plot: adjusted R squared (y-axis) against sampling timepoint in pseudo-generations since climate change started (x-axis); panels: fast (2.5 pseudo-generations), intermediate (10 pseudo-generations), and slow (50 pseudo-generations) environmental change; series: causal and empirical geometric genomic offset]
Figure 23